Interface for eigensolver on QUDA #561

Merged: 32 commits into quda_work on Oct 4, 2023
Conversation

@aniketsen (Contributor) commented May 9, 2023

eigsolveQuda acts as an interface for calling the eigensolver in QUDA. In phmc.c, this interface is called based on the input parameter UseExternalEigSolver, which is stored in the monomial parameter external_eigsolver. The interface also initializes QUDA and the two-flavour solver.
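Schematically, the call site looks like this (a sketch only: the full argument list of eigsolveQuda is omitted here, and the exact test on external_eigsolver is an assumption for illustration):

if(mnl->external_eigsolver) { /* set via the UseExternalEigSolver input parameter */
  /* eigsolveQuda initializes QUDA on first use and sets up the two-flavour
   * solver before running the QUDA eigensolver; maxmin selects the largest (1)
   * or smallest (0) eigenvalue */
  double ev_min = eigsolveQuda(/* maxmin = */ 0 /* , ... */);
  double ev_max = eigsolveQuda(/* maxmin = */ 1 /* , ... */);
}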

@kostrzewa (Member) commented May 24, 2023

I tested this on a 32c64 lattice and, as far as I can tell, it works correctly. However, I think the time to solution for the lowest eigenvalue can be reduced significantly. On the machine in question (a single quad-A100 node), the solver takes around 200 seconds for the lowest eigenvalue and around 10 seconds for the largest one (that measurement was at too low a precision; at full precision it is actually over 450 seconds in total). The CPU solver, on just 2x16 AMD EPYC2 cores, takes "only" 1000 seconds by comparison. I thus suspect that the calculation of the lowest eigenvalue can be sped up by at least another factor of 10. If you have the time and the inclination, @aniketsen, it would be great if you could explore the polynomial acceleration a little. The documentation of the eigensolver on the QUDA wiki is not bad as far as I can tell.

My guess is that appropriate settings for a_min and a_max (depending on whether one is looking for the smallest or the largest eigenvalue) will speed this up massively. I'm not sure about the order of the polynomial that one should use (poly_deg).
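For reference, my understanding of the underlying Chebyshev construction (a textbook summary, not lifted from the QUDA source): the window [a_min, a_max] is mapped onto [-1, 1], where the degree-k Chebyshev polynomial T_k is bounded by one, while it grows rapidly outside,

\hat{\lambda} = \frac{2\lambda - a_{\min} - a_{\max}}{a_{\max} - a_{\min}},
\qquad p(\lambda) = T_k(\hat{\lambda}),
\qquad |T_k(\hat{\lambda})| \le 1 \quad \text{for } \lambda \in [a_{\min}, a_{\max}].

Eigenvalues below a_min are thus strongly amplified in p(A), so the smallest eigenvalues of the operator become the largest ones of the filtered operator; this is the SR <-> LR exchange visible in the logs below.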

@kostrzewa (Member) commented:

No poly accel:

# TM_QUDA: mu = 0.138390000000, epsilon = 0.146560000000 kappa = 0.140005000000, csw = 1.711200000000
# QUDA: Orthonormalising initial guess
# QUDA: ********************************
# QUDA: **** START QUDA EIGENSOLVER ****
# QUDA: ********************************
# QUDA: TRLM computed the requested 1 vectors in 412 restart steps and 19824 OP*x operations.
# QUDA: Eval[0000] = (+1.1364231284241534e-04,-5.9654390341346649e-22) ||+1.1364231284241534e-04|| Residual = +1.3907589993294887e-14
# QUDA: ********************************
# QUDA: ***** END QUDA EIGENSOLVER *****
# QUDA: ********************************
# TM_QUDA: Time for eigsolveQuda 4.568580e+02 s level: 2 proc_id: 0 /HMC/ndcloverrat1:ndrat_heatbath/eigsolveQuda
# TM_QUDA: Using single prec. as sloppy!
# TM_QUDA: Using double-half refinement in mshift-solver!
# TM_QUDA: Called _loadGaugeQuda for gauge_id: 0.000000
# TM_QUDA: Theta boundary conditions will be applied to gauge field
# TM_QUDA: Using mixed precision CG!
# TM_QUDA: Using EO preconditioning!
# TM_QUDA: Time for loadCloverQuda 1.272554e-02 s level: 3 proc_id: 0 /HMC/ndcloverrat1:ndrat_heatbath/eigsolveQuda/loadCloverQuda
# TM_QUDA: mu = 0.138390000000, epsilon = 0.146560000000 kappa = 0.140005000000, csw = 1.711200000000
# QUDA: Orthonormalising initial guess
# QUDA: ********************************
# QUDA: **** START QUDA EIGENSOLVER ****
# QUDA: ********************************
# QUDA: TRLM computed the requested 1 vectors in 4 restart steps and 240 OP*x operations.
# QUDA: Eval[0000] = (+3.0261608378473412e+00,-6.2242957455787945e-17) ||+3.0261608378473412e+00|| Residual = +1.0169756438652394e-14
# QUDA: ********************************
# QUDA: ***** END QUDA EIGENSOLVER *****
# QUDA: ********************************
# TM_QUDA: Time for eigsolveQuda 5.180968e+00 s level: 2 proc_id: 0 /HMC/ndcloverrat1:ndrat_heatbath/eigsolveQuda

smallest: 457 seconds
largest: 5 seconds

TOTAL: 462 seconds (compared to over 1000 seconds on the CPU for this particular ensemble and machine)

untuned poly accel:

(note that poly acceleration exchanges SR <-> LR)

# QUDA: ********************************
# QUDA: **** START QUDA EIGENSOLVER ****
# QUDA: ********************************
# QUDA: spectrum LR
# QUDA: tol 1.0000e-14
# QUDA: n_conv 1
# QUDA: n_ev 1
# QUDA: n_kr 96
# QUDA: polyDeg 128
# QUDA: a-min 0.001000
# QUDA: a-max 4.000000
# QUDA: Resizing kSpace to 144 vectors
# QUDA: 0000 converged eigenvalues at restart iter 0001
# QUDA: 0000 converged eigenvalues at restart iter 0002
# QUDA: Resizing kSpace to 145 vectors
# QUDA: 0001 converged eigenvalues at restart iter 0003
# QUDA: TRLM computed the requested 1 vectors in 3 restart steps and 192 OP*x operations.
# QUDA: RitzValue[0000]: (+7.9066095102425749e-01, +0.0000000000000000e+00) residual 4.0309354968575832e-17
# QUDA: Eval[0000] = (+1.1364231284240960e-04,+1.4837671236894835e-21) ||+1.1364231284240960e-04|| Residual = +1.7886132427934613e-15
# QUDA: ********************************
# QUDA: ***** END QUDA EIGENSOLVER *****
# QUDA: ********************************
# TM_QUDA: Time for eigsolveQuda 7.718927e+01 s level: 2 proc_id: 0 /HMC/ndcloverrat1:ndrat_heatbath/eigsolveQuda
[...]
# QUDA: ********************************
# QUDA: **** START QUDA EIGENSOLVER ****
# QUDA: ********************************
# QUDA: spectrum SR
# QUDA: tol 1.0000e-14
# QUDA: n_conv 1
# QUDA: n_ev 1
# QUDA: n_kr 96
# QUDA: polyDeg 128
# QUDA: a-min 0.001000
# QUDA: a-max 4.000000
# QUDA: Resizing kSpace to 144 vectors
# QUDA: 0000 converged eigenvalues at restart iter 0001
# QUDA: 0000 converged eigenvalues at restart iter 0002
# QUDA: Resizing kSpace to 145 vectors
# QUDA: 0001 converged eigenvalues at restart iter 0003
# QUDA: TRLM computed the requested 1 vectors in 3 restart steps and 192 OP*x operations.
# QUDA: RitzValue[0000]: (+7.9066095102425749e-01, +0.0000000000000000e+00) residual 4.0309354968575832e-17
# QUDA: Eval[0000] = (+1.1364231284240960e-04,+1.4837671236894835e-21) ||+1.1364231284240960e-04|| Residual = +1.7886132427934613e-15
# QUDA: ********************************
# QUDA: ***** END QUDA EIGENSOLVER *****
# QUDA: ********************************
# TM_QUDA: Time for eigsolveQuda 7.709653e+01 s level: 2 proc_id: 0 /HMC/ndcloverrat1:ndrat_heatbath/eigsolveQuda

It seems that with the chosen settings

  eig_param.use_poly_acc = QUDA_BOOLEAN_TRUE;
  eig_param.poly_deg = 128;
  eig_param.a_min = 1e-3;
  eig_param.a_max = 4;

the solve for the largest eigenvalue does not work. One likely needs to disable polynomial acceleration for that solve; since its time to solution is very short anyway, that is perfectly fine.

00000000 2.41792e-05 2.41792e-05 8.00000e-06 1.00000e+00

The timing reflects this fact:

smallest: 77 seconds
"largest": 77 seconds

TOTAL: 154 seconds

Proposal for a default:

double eigsolveQuda(..){
[...]
  eig_param.use_poly_acc = maxmin == 1 ? QUDA_BOOLEAN_FALSE : QUDA_BOOLEAN_TRUE;
  eig_param.poly_deg = 128;
  eig_param.a_min = 1e-3;
  eig_param.a_max = 4;
[...]
}

We might also need some new parameters to make the degree, a_min and a_max user-configurable.
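In the input file this might look something like the following; the parameter names EigPolyDeg, EigAmin and EigAmax are placeholders I am making up for illustration (only UseExternalEigSolver exists in this PR), assuming the same value convention as UseExternalInverter:

BeginMonomial NDCLOVERRAT
  # ... usual monomial parameters ...
  UseExternalEigSolver = quda
  EigPolyDeg = 128 # hypothetical parameter
  EigAmin = 1e-3   # hypothetical parameter
  EigAmax = 4.0    # hypothetical parameter
EndMonomial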

# QUDA: Orthonormalising initial guess
# QUDA: ********************************
# QUDA: **** START QUDA EIGENSOLVER ****
# QUDA: ********************************
# QUDA: TRLM computed the requested 1 vectors in 3 restart steps and 192 OP*x operations.
# QUDA: RitzValue[0000]: (+7.9066095102425749e-01, +0.0000000000000000e+00) residual 4.0309354968575832e-17
# QUDA: Eval[0000] = (+1.1364231284240960e-04,+1.4837671236894835e-21) ||+1.1364231284240960e-04|| Residual = +1.7886132427934613e-15
# QUDA: ********************************
# QUDA: ***** END QUDA EIGENSOLVER *****
# QUDA: ********************************
# TM_QUDA: Time for eigsolveQuda 7.718751e+01 s level: 2 proc_id: 0 /HMC/ndcloverrat1:ndrat_heatbath/eigsolveQuda
[...]
# QUDA: Orthonormalising initial guess
# QUDA: ********************************
# QUDA: **** START QUDA EIGENSOLVER ****
# QUDA: ********************************
# QUDA: TRLM computed the requested 1 vectors in 4 restart steps and 240 OP*x operations.
# QUDA: Eval[0000] = (+3.0261608378473412e+00,-6.2242957455787945e-17) ||+3.0261608378473412e+00|| Residual = +1.0169756438652394e-14
# QUDA: ********************************
# QUDA: ***** END QUDA EIGENSOLVER *****
# QUDA: ********************************
# TM_QUDA: Time for eigsolveQuda 5.185351e+00 s level: 2 proc_id: 0 /HMC/ndcloverrat1:ndrat_heatbath/eigsolveQuda

smallest: 77 seconds
largest: 5 seconds

TOTAL: 82 seconds

I'm pretty sure that this can be improved further by lowering the polynomial degree and optimising a_min, as described here: https://github.com/lattice/quda/wiki/QUDA%27s-eigensolvers#using-chebyshev-in-your-computations
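A possible recipe, following the wiki (a sketch: the eigsolveQuda argument list is elided as above, and the safety factor and degree are guesses that would need tuning per ensemble):

/* hypothetical tuning sketch, not part of this PR: run the cheap
 * unaccelerated LR solve first and use its result to set the
 * Chebyshev window for the expensive SR solve */
double ev_max = eigsolveQuda(/* maxmin = */ 1 /* , ... */); /* ~5 s, no poly accel */

eig_param.use_poly_acc = QUDA_BOOLEAN_TRUE;
eig_param.a_max = 1.1 * ev_max;  /* safety margin above the largest eigenvalue   */
eig_param.a_min = 1e-3;          /* must stay above the eigenvalues we are after */
eig_param.poly_deg = 64;         /* try halving 128 and check convergence holds  */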

@kostrzewa (Member) commented May 24, 2023

If you have some time, maybe you can take a look at / merge #564, try to make the parameters user-configurable, and/or test with some production ensembles at various lattice spacings whether the hard-coded defaults are okay for now. We can certainly live with 80-ish seconds for the eigensolver in the HMC.

aniketsen and others added 2 commits on June 7, 2023 10:26:

- first stab at using polynomial acceleration for the ND eigensolver
- …urb global inv_param when changing precision, adjust indentation in one place

eig_param.invert_param = &inv_param;

// need our own QudaInvertParam for passing the operator properties
@kostrzewa (Member) commented:
@aniketsen note that I've added this just now because the change of precision is performed in the global inv_param, which might have unforeseen consequences in certain situations

@kostrzewa (Member) added:
this way, the global inv_param is never touched via the pointer
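In other words, the pattern is (a sketch with an assumed local variable name; QudaInvertParam is a plain C struct, so assignment makes an independent copy):

QudaInvertParam eig_invert_param = inv_param;              /* local copy of the global struct */
eig_invert_param.cuda_prec = QUDA_DOUBLE_PRECISION;        /* precision changes stay local    */
eig_invert_param.cuda_prec_sloppy = QUDA_DOUBLE_PRECISION;
eig_param.invert_param = &eig_invert_param;                /* global inv_param never modified */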

@kostrzewa (Member) commented:

@aniketsen slowly getting back into finalizing this and other things

As far as I can tell there is currently no way to actually build test_eigsolveQuda (I don't see any changes to the build system). In test_eigsolveQuda.c I'm also a bit confused by the set of monomials listed, since ComputeEVFreq can only be set for a subset of these (from read_input.l):

2582 <NDPOLYMONOMIAL,CLPOLYMONOMIAL,NDRATMONOMIAL,NDRATCORMONOMIAL,NDCLRATMONOMIAL,NDCLRATCORMONOMIAL,RATMONOMIAL,RATCORMONOMIAL,CLRATMONOMIAL,CLRATCORMONOMIAL>
[...]
2597   {SPC}*ComputeEVFreq{EQL}{DIGIT}+ {
2598     sscanf(yytext, " %[a-zA-Z] = %d", name, &a);
2599     mnl->rec_ev = a;
2600     if(myverbose!=0) printf("  Frequency for computing EV's set to %d in line %d monomial %d\n", mnl->rec_ev, line_of_file, current_monomial);
2601   }

If you want, we can just leave test_eigsolveQuda out of this PR: it doesn't really offer the kind of stand-alone eigensolver interface that I was hoping for; rather, it focuses just on the min/max evals, and I'm not sure whether it actually works correctly. When you cycle through the monomials:

361   for(j = 0; j < no_monomials; j++) {
362     if( (monomial_list[j].type == NDPOLY) || (monomial_list[j].type == NDDETRATIO)
363     || (monomial_list[j].type == NDCLOVER) || (monomial_list[j].type == NDRAT)
364     || (monomial_list[j].type == NDCLOVERRAT) || (monomial_list[j].type == NDRATCOR)
365     || (monomial_list[j].type == NDCLOVERRATCOR) || (monomial_list[j].type == NDCLOVERDETRATIO) ) {
366       if( (monomial_list[j].rec_ev != 0) ) {
367         monomial * mnl = &monomial_list[j];

you don't actually set any of the global parameters based on the monomial and instead immediately call eigsolveQuda. This means that things like g_kappa and g_mu will simply be unset, right?

If you look at something like ndrat_monomial.c, this is what happens before the eigensolver driver is called:

205 void ndrat_heatbath(const int id, hamiltonian_field_t * const hf) {
206   monomial * mnl = &monomial_list[id];
207   tm_stopwatch_push(&g_timers, __func__, mnl->name);
208   nd_set_global_parameter(mnl);
209   mnl->iter1 = 0;
210   if(mnl->type == NDCLOVERRAT) {
211     init_sw_fields();
212     sw_term((const su3**)hf->gaugefield, mnl->kappa, mnl->c_sw);
213     sw_invert_nd(mnl->mubar*mnl->mubar - mnl->epsbar*mnl->epsbar);
214     copy_32_sw_fields();
215   }
216   // we measure before the trajectory! 
217   if((mnl->rec_ev != 0) && (hf->traj_counter%mnl->rec_ev == 0)) {
218     if(mnl->type != NDCLOVERRAT) phmc_compute_ev(hf->traj_counter-1, id, &Qtm_pm_ndbipsi);
219     else phmc_compute_ev(hf->traj_counter-1, id, &Qsw_pm_ndbipsi);
220   }

where nd_set_global_parameter is:

 52 void nd_set_global_parameter(monomial * const mnl) {
 53 
 54   g_mubar = mnl->mubar;
 55   g_epsbar = mnl->epsbar;
 56   g_kappa = mnl->kappa;
 57   g_c_sw = mnl->c_sw;
 58   boundary(g_kappa);
 59   phmc_cheb_evmin = mnl->EVMin;
 60   phmc_invmaxev = mnl->EVMaxInv;
 61   phmc_cheb_evmax = mnl->EVMax;
 62   phmc_Cpol = 1.;
 63   // used for preconditioning in cloverdetrat
 64   g_mu3 = 0.;
 65 
 66   return;
 67 }
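So the loop in test_eigsolveQuda.c would need something along these lines before calling the eigensolver (a sketch built from the excerpts above; the eigsolveQuda arguments are elided, and I am using the global g_gauge_field since the stand-alone program has no hamiltonian_field_t):

for(j = 0; j < no_monomials; j++) {
  monomial * mnl = &monomial_list[j];
  if( (mnl->type == NDCLOVERRAT) /* || ... the other ND-type monomials ... */ ) {
    if(mnl->rec_ev != 0) {
      /* transfer kappa, mubar, epsbar, c_sw, ... from the monomial to the
         globals, exactly as the heatbath drivers do */
      nd_set_global_parameter(mnl);
      if(mnl->type == NDCLOVERRAT) {
        init_sw_fields();
        sw_term((const su3**)g_gauge_field, mnl->kappa, mnl->c_sw);
        sw_invert_nd(mnl->mubar*mnl->mubar - mnl->epsbar*mnl->epsbar);
        copy_32_sw_fields();
      }
      eigsolveQuda(/* ... */);
    }
  }
}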

@aniketsen (Contributor, Author) commented Sep 19, 2023

> As far as I can tell there is currently no way to actually build test_eigsolveQuda (I don't see any changes to the build system). In test_eigsolveQuda.c I'm also a bit confused by the set of monomials listed, since ComputeEVFreq can only be set for a subset of these (from read_input.l):

Yes, I did not push the changes to the build system. I had added all the monomials for a more general approach, but you are right that ComputeEVFreq is not defined for all of them. In those cases, however, the variable rec_ev is set to zero by default and the eigensolver is simply not called.

> If you want, we can just leave test_eigsolveQuda out of this PR: it doesn't really offer the kind of stand-alone eigensolver interface that I was hoping for; rather, it focuses just on the min/max evals, and I'm not sure whether it actually works correctly. When you cycle through the monomials:

Yes, this is a mistake. For some reason I thought init_monomials also initialized the global parameters, but of course one needs to call nd_set_global_parameter when cycling through the different monomials.

You can just leave this program out. I agree that it does not perform a very useful function at the moment beyond calling the eigensolver for the different monomials, and you can just as easily get the same job done by setting ComputeEVFreq to 1 and running the standard hmc program.
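For completeness, that amounts to adding (illustrative monomial block, all other parameters elided):

BeginMonomial NDCLOVERRAT
  # ... usual monomial parameters ...
  ComputeEVFreq = 1
EndMonomial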

@kostrzewa merged commit 49b0dea into quda_work on Oct 4, 2023. 3 checks passed.